readme.txt

I.	Introduction

Documentation for Evans and Garthwaite, "Estimating the 
Heterogeneity in Medical Treatment Intensity," Review of 
Economics and Statistics, 2012, 94(3), 635-649.

The data for this project come from a restricted-use data set 
called the "Linked PDD/Birth Cohort File," which was produced by 
the State of California Office of Statewide Health Planning and 
Development (OSHPD).  This product is a research database 
created for the purpose of studying delivery and birth outcomes.  
The linkage uses information from the following data sets:
*	California Patient Discharge Data
*	Vital Statistics Birth Certificate Data
*	Vital Statistics Death Certificate Data
*	Vital Statistics Fetal Death File 
*	Vital Statistics Birth Cohort File

The file contains all infants who were born in a given year, 
including births that occurred in a California hospital that 
reports to OSHPD; births that occurred in a California hospital 
that did not report to OSHPD; and births that occurred outside 
California.  It includes all infants and mothers irrespective of 
whether they were linked to a birth record or not.

Because the data set is restricted use, our agreement with OSHPD prevents us from reposting any of the data here.  However, the original data sets used in this analysis can be obtained from OSHPD using the contact information below.

Louise Hand
Healthcare Information Resource Center 
400 R Street, Suite 250 
Sacramento, CA 95811-6213 
Tel: (916) 326-3802 
Fax: (916) 324-9242
http://www.oshpd.ca.gov/HID/HIRC/index.html


II.	Outline of data programs

The data from OSHPD come in 6 annual SAS data sets.  We first 
construct the analysis file using SAS Version 9.2, then convert 
the SAS files into STATA data sets and run all the statistical 
models using STATA MP version 10.1.

There are three major sets of results in the paper.  The first 
includes models with all births pooled together, and one program 
generates all of these results, which are in Tables 1-5.  At 
various points in the paper, we also run separate models by 
delivery method (vaginal births or c-sections).  These two sets 
of results are in Tables 3 and 5.  

Below is an outline of the programs used in the analysis:

construct_sas_file.sas - This takes the 6 annual files in SAS format and constructs a SAS data file that is then converted into STATA format for analysis.  

reduce_sample.do - The SAS program above produces a SAS data set that is converted into a STATA data set named stata_all_1.dta.  This program takes that data set, deletes some unused variables and constructs 48 dummy variables that indicate the presence of complications during the pregnancy and delivery.  The output from this program is a STATA data set called stata_all_2.dta.

results_fullsample_final.do - Using STATA data set stata_all_2.dta, this program produces the results in Tables 1-5 for the full sample regressions and models that use the full sample propensity score.

results_for_csection_only.do - Using STATA data set stata_all_2.dta, this program produces results in Tables 3 and 5 that are for c-section births.

results_for_vaginal_only.do - Using STATA data set stata_all_2.dta, this program produces results in Tables 3 and 5 that are for vaginal births.
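
The run order implied by the outline above can be written down as a small dependency check.  This is only an illustrative sketch in Python, not part of the replication programs (which are SAS and STATA).  The program names and .dta names come from this readme; the labels "annual SAS files" and "analysis SAS file" are placeholders, because the readme does not name those files, and the conversion step is listed without a program name for the same reason.

```python
# Each entry: (program, files it reads, files or results it produces).
# "annual SAS files" and "analysis SAS file" are placeholder labels (assumptions).
PIPELINE = [
    ("construct_sas_file.sas", ["annual SAS files"], ["analysis SAS file"]),
    ("(SAS-to-STATA conversion)", ["analysis SAS file"], ["stata_all_1.dta"]),
    ("reduce_sample.do", ["stata_all_1.dta"], ["stata_all_2.dta"]),
    ("results_fullsample_final.do", ["stata_all_2.dta"], ["Tables 1-5"]),
    ("results_for_csection_only.do", ["stata_all_2.dta"], ["Tables 3 and 5 (c-section)"]),
    ("results_for_vaginal_only.do", ["stata_all_2.dta"], ["Tables 3 and 5 (vaginal)"]),
]


def check_order(pipeline, raw_inputs):
    """Return program names in order; raise if a step reads a file not yet produced."""
    available = set(raw_inputs)
    order = []
    for program, inputs, outputs in pipeline:
        missing = [name for name in inputs if name not in available]
        if missing:
            raise ValueError(f"{program} needs {missing} before it can run")
        available.update(outputs)
        order.append(program)
    return order


if __name__ == "__main__":
    for step, program in enumerate(check_order(PIPELINE, ["annual SAS files"]), start=1):
        print(step, program)
```

The three results programs all read stata_all_2.dta and do not depend on each other, so they can be run in any order once reduce_sample.do has finished.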


III.	Variable definitions for variables used in stata_all_2.dta

The programs that use the STATA data set stata_all_2.dta rely on a key set of variables that are defined below. Many of these variables are transformed into other variables in the programs, and these new variables are defined within those programs.

bthwght	birthweight in grams
trend	monthly trend, =1 in first month (January 1995), =72 in final month (December 2000)
hplhsa	indicator for health service area in California, 1 through 
       	14.
statelaw	dummy variable that equals 1 when only the state law was in effect (August 1997 through December 1997) 
fedlaw	dummy variable that equals 1 for births from January 1999 
       	onward, the period when the federal law was in effect, =0 
       	otherwise
payer_delivery	=1 if the payer was private insurance, =2 if the payer was Medicaid.  All other payers (e.g., uninsured, other type of insurance) were deleted
admmnthi	admit month (birth month) for the infant
admdaym	admit day (1=Sunday, 7=Saturday)
admyri	admit year (1995-2000)
birth_hour	hour of birth on 24 hour clock
sex		1=boy, 2=girl
previous_birth	number of previous births
agegroup	mother's age group, =20 if <20, =25 if >=20 and <25, etc.  There
		are 6 groups in total
readmission28i	dummy variable, =1 if the infant was readmitted to the hospital within 28 days, =0 otherwise
deliverysize 	measure of the size of the hospital based on average number of deliveries per year.  1 is <200, 2 is >=200 and <500, 3 is >=500 and <1000, 4 is >=1000 and <1500, 5 is >=1500 and <3000, 6 is > 3000.
gest		estimated weeks of gestation.   999 is missing value code
delivery	delivery type, =1 if vaginal birth, =2 if c-section
_losi		infant's length of stay in the hospital in days
hospitidm	numeric variable stored in character format that is a unique 
		hospital ID
hospital_owner	categorical variable, 1-11, of ownership status. 
 		1=church, 2=non-profit corporation, 3=non-profit other, etc.
typebth	=1 if a singleton, =2 if a twin, =3 if a triplet, etc.
hisphm	categorical variable, 1=Hispanic, 2=not Hispanic, 3=unknown 
		Hispanic origin
disstat95i	infant's discharge status.  1=routine, 2=moved to acute care within hospital, 3=moved to other care within hospital, 4=moved to long term care within hospital, 5-9=discharged to another medical facility (transfers), 10=left against medical advice, 11=died, 12=home health service.
probl_1	Problems during the pregnancy.  There can be up to 16 2-digit codes per record.  
probl_2	Problems during the pregnancy.  There can be up to 9 2-digit codes per record.
prob(j)	We use the previous two variables to construct a series of dummy variables, each of which equals 1 if a specific complication was present during the pregnancy or the birth and equals 0 otherwise.  The dummy variables for specific conditions are defined below.
	prob01	preeclampsia	
	prob02	eclampsia
	prob03	hypertension
	prob04	renal disease
	prob05	pyelonephritis  
	prob06	anemia
	prob07	cardiac disease
	prob08	lung disease
	prob09	diabetes
	prob10	rh sensitivity
	prob11	uterine bleeding
	prob12 	Hemoglobinopathy  
	prob13	transport delivery
	prob14	Polyhydramnios
	prob15	incompetent cervix
	prob16	premature labor
	prob17	genital herpes
	prob18	other sexually transmitted disease
	prob19	hepatitis B
	prob20	rubella
	prob21	smoking
	prob22	birth weight greater than 4000 grams
	prob23	birth weight less than 2500 grams
	prob24	cervical cerclage
	prob25	gestation less than 37 weeks
	prob26 	chorionic villus
	prob27	preeclampsia delivery
	prob28	eclampsia delivery
	prob29	seizure delivery
	prob30	maternal transfusion during delivery
	prob31	fetopelvic delivery
	prob32	shoulder delivery
	prob33	breech delivery
	prob34 	precipitous delivery
	prob35	prolonged delivery
	prob36	vbac delivery
	prob37	other dysfunctional delivery
	prob38 	premature rupture delivery
	prob39	abruptio placenta delivery
	prob40	placenta previa delivery
	prob41	excessive bleeding during pregnancy
	prob42	herpes delivery
	prob43	sepsis delivery
	prob44	febrile delivery
	prob45	meconium delivery
	prob46	cord prolapse delivery
	prob47	fetal distress delivery
	prob48	anesthetic complication during delivery
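
The construction of the prob(j) dummies from the packed 2-digit codes can be sketched as follows.  This is an illustrative Python sketch, not the code in reduce_sample.do.  It assumes that probl_1 and probl_2 hold the codes packed back to back in a character string, and the code-to-dummy mapping shown is HYPOTHETICAL, since this readme does not list which 2-digit code corresponds to which condition.

```python
def split_codes(field, width=2):
    """Split a packed string such as '0921' into ['09', '21']; blanks give []."""
    text = (field or "").strip()
    return [text[i:i + width] for i in range(0, len(text), width)]


# HYPOTHETICAL mapping of (source variable, 2-digit code) to a dummy name.
# The real codes are not given in this readme; these entries are placeholders.
CODE_TO_DUMMY = {
    ("probl_1", "09"): "prob09",
    ("probl_1", "21"): "prob21",
    ("probl_2", "33"): "prob33",
}


def make_dummies(probl_1, probl_2, code_to_dummy=CODE_TO_DUMMY):
    """Return {dummy name: 0 or 1} for one record, keeping the two variables separate."""
    present = {
        "probl_1": set(split_codes(probl_1)),
        "probl_2": set(split_codes(probl_2)),
    }
    return {
        dummy: int(code in present[source])
        for (source, code), dummy in code_to_dummy.items()
    }


if __name__ == "__main__":
    print(make_dummies("0921", ""))
```

Keeping the two source variables separate in the lookup avoids treating a code in probl_2 as if it had appeared in probl_1, in case the two variables reuse the same 2-digit values for different conditions.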

meduc		mother's education in years, 99 is missing value
mrace_new	mother's race, 1 if white, 2 if black, 3 if American Indian, 
		4 if Asian, 5 if other, 9 if unknown
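
As a worked example of the time variables defined above, the sketch below derives trend from admyri and admmnthi and then derives statelaw from trend.  It is written in Python for illustration only and is not taken from the SAS or STATA programs.  It assumes the sample runs from January 1995 through December 2000, which is consistent with admyri (1995-2000) and a trend that ends at 72; the function names are hypothetical.

```python
def monthly_trend(admyri, admmnthi, first_year=1995):
    """Month counter: 1 in January of first_year, increasing by 1 each month."""
    if not 1 <= admmnthi <= 12:
        raise ValueError("admmnthi must be between 1 and 12")
    return (admyri - first_year) * 12 + admmnthi


def statelaw_dummy(trend):
    """1 for births in August 1997 through December 1997, 0 otherwise."""
    start = monthly_trend(1997, 8)
    end = monthly_trend(1997, 12)
    return int(start <= trend <= end)


if __name__ == "__main__":
    print(monthly_trend(1995, 1), monthly_trend(2000, 12))
    print(statelaw_dummy(monthly_trend(1997, 10)))
```

Under these assumptions, August 1997 is month 32 and December 1997 is month 36, so statelaw equals 1 for trend values 32 through 36.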

